Summary of web crawler technology research

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Research on Model of Network Information Extraction Based on Improved Topic-focused Web Crawler Key Technology

Original scientific paper This research has caught researchers' wide attention for extracting network information exactly with the arrival of the big data era characterized by semistructured or unstructured text. This paper proposes a model of network information extraction based on improved topic-focused web crawler key technology taking Web news as object of extraction. The authors elaborate ...

متن کامل

A Web Crawler System Design Based on Distributed Technology

A practical distributed web crawler architecture is designed. The distributed cooperative grasping algorithm is put forward to solve the problem of distributed Web Crawler grasping. Log structure and Hash structure are combined and a large-scale web store structure is devised, which can meet not only the need of a large amount of random accesses, but also the need of newly added pages. Experime...

متن کامل

World Wide Web Crawler

We describe our ongoing work on world wide web crawling, a scalable web crawler architecture that can use resources distributed world-wide. The architecture allows us to use loosely managed compute nodes (PCs connected to the Internet), and may save network bandwidth significantly. In this poster, we discuss why such architecture is necessary, point out difficulties in designing such architectu...

متن کامل

Reinforcement-Based Web Crawler

This paper presents a focused web crawler system which automatically creates a minority language corpora. The system uses a database of relevant and irrelevant documents testing the relevance of retrieved web documents. The system requires a starting web document to indicate where the search would begin.

متن کامل

Web Crawler Architecture

Definition A web crawler is a program that, given one or more seed URLs, downloads the web pages associated with these URLs, extracts any hyperlinks contained in them, and recursively continues to download the web pages identified by these hyperlinks. Web crawlers are an important component of web search engines, where they are used to collect the corpus of web pages indexed by the search engin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Physics: Conference Series

سال: 2020

ISSN: 1742-6588,1742-6596

DOI: 10.1088/1742-6596/1449/1/012036